A Linguistically-Based Segmentation of Complex Sentences

Authors

  • Vladislav Kubon
  • Markéta Lopatková
  • Martin Plátek
  • Patrice Pognan
Abstract

The paper describes a method of dividing complex sentences into segments: easily detectable, linguistically motivated units that may provide a basis for further processing of complex sentences. The method has been developed for Czech, as a representative of languages with a relatively high degree of word-order freedom. The paper introduces important terms and describes the segmentation chart, a data structure used to describe the mutual relationships between individual segments and separators. It contains a simple set of rules, which is applied to the segmentation of a small set of Czech sentences. Issues of segment annotation based on an existing corpus are also mentioned.
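
As a rough illustration of the idea of segments delimited by separators, the following minimal sketch splits a sentence at punctuation marks and conjunctions. It is not the paper's rule set: the separator inventory (`PUNCTUATION`, `CONJUNCTIONS`), the function names, and the flat output list are all assumptions made for this example, and the segmentation chart itself is not modelled.

```python
import re

# Hypothetical separator inventory (an assumption for this sketch):
# punctuation marks plus a handful of common Czech conjunctions.
PUNCTUATION = {",", ";", ":", "(", ")", ".", "!", "?"}
CONJUNCTIONS = {"a", "ale", "nebo", "že", "protože", "když"}


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def segment(sentence):
    """Return a list of ('segment', [tokens]) and ('separator', token) units.

    Every separator token closes the segment collected so far (if any)
    and is emitted as a unit of its own.
    """
    units = []
    current = []
    for token in tokenize(sentence):
        if token in PUNCTUATION or token.lower() in CONJUNCTIONS:
            if current:
                units.append(("segment", current))
                current = []
            units.append(("separator", token))
        else:
            current.append(token)
    if current:
        units.append(("segment", current))
    return units


if __name__ == "__main__":
    for kind, content in segment("Petr řekl, že přijde, ale nepřišel."):
        print(kind, content)
```

A real system would need a linguistically grounded separator list and a structure that records how segments relate to one another, which this flat list deliberately leaves out.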

Similar resources

Segmentation of Complex Sentences

The paper describes a method of dividing complex sentences into segments, easily detectable and linguistically motivated units that may be subsequently combined into clauses and thus provide a structure of a complex sentence with regard to the mutual relationship of individual clauses. The method has been developed for Czech as a language representing languages with a relatively high degree of wo...

Annotation of sentence structure - Capturing the relationship between clauses in Czech sentences

The focus of this article is on the creation of a collection of sentences manually annotated with respect to their sentence structure. We show that the concept of linear segments—linguistically motivated units, which may be easily detected automatically—serves as a good basis for the identification of clauses in Czech. The segment annotation captures such relationships as subordination, coordin...

Automatic linguistic segmentation of conversational speech

As speech recognition moves toward more unconstrained domains such as conversational speech, we encounter a need to be able to segment (or resegment) waveforms and recognizer output into linguistically meaningful units, such as sentences. Toward this end, we present a simple automatic segmenter of transcripts based on N-gram language modeling. We also study the relevance of several word-level fe...

Identifying linguistic segmentations in Chinese spoken dialogue

In a continuous speech recognition system, a longer waveform is usually segmented into shorter pieces based on simple acoustic criteria, such as unfilled pauses (i.e., silences). We call this kind of segmentation an acoustic segmentation. In general, acoustic segmentations do not reflect the linguistic structure. They may fragment sentences or semantic units. Besides, they may als...

The Role of Self-Regulatory Approach in Iranian Learners' Lexical Segmentation: The case of authentic materials

The present research investigated the effect of a self-regulatory approach (with two components of self-checking and self-efficacy) on pre-intermediate Iranian learners' lexical segmentation in listening comprehension via authentic listening comprehension texts. To achieve this purpose, the investigators administered an Oxford Placement Test (2007) to ninety-eight students of two girls’ private j...

Publication date: 2007